Unreliable failure detectors for asynchronous distributed systems
Authors
Abstract
Distributed computing is very attractive, but it comes with new problems: information loss, overflow, and breakdowns. Most often, these are neglected. Indeed, it has been shown that Consensus (a fundamental problem which requires that the processes agree on a common value) is unsolvable in a realistic computing model, i.e. a completely asynchronous one with possible crash failures [FLP85]. Intuitively, in an asynchronous environment, a process cannot determine whether a component has crashed or is merely very slow. Several approaches were designed to “bypass” that impossibility. One of them is self-stabilization, studied at LaRIA, which deals with transient faults. The principle is to design algorithms which can be executed from any initial state and eventually work according to their specification. Snap-stabilization is stronger: from any initial state, the algorithm always behaves according to its specification. The first snap-stabilizing algorithms were designed at LaRIA. Another approach, which we are going to study, copes with definitive (crash) failures. Ideally, a black box should be attached to each process to indicate precisely the failures of the network. This black box is called a failure detector. However, the result of [FLP85] implies that such a perfect failure detector cannot be implemented. That is why Chandra and Toueg introduced in [CHT96] the notion of unreliable failure detectors. Even though such detectors are still impossible to implement, in practice this approach makes it possible to implement semi-algorithms. In theory, this approach also makes it possible to introduce a hierarchy of the unreliable
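The notion of an unreliable failure detector described in the abstract can be illustrated with a minimal Python sketch. It is not the construction from [CHT96] or from this paper. It is a common heartbeat-and-timeout approximation, and it only behaves well if message delays eventually become bounded, which the purely asynchronous model does not guarantee. The class name `HeartbeatDetector`, its fields, and the timeout-doubling rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Timeout-based unreliable failure detector (illustrative sketch only)."""
    peers: list
    initial_timeout: float = 1.0
    # Hypothetical bookkeeping for this sketch, keyed by peer name.
    last_seen: dict = field(default_factory=dict)
    timeout: dict = field(default_factory=dict)
    suspected: set = field(default_factory=set)

    def __post_init__(self):
        for p in self.peers:
            self.last_seen[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def on_heartbeat(self, peer, now):
        """Record a heartbeat received from `peer` at local time `now`."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            # The peer was only slow, not crashed: withdraw the suspicion
            # and be more patient with this peer from now on.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for p in self.peers:
            if now - self.last_seen[p] > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)
```

In this sketch the detector may wrongly suspect a slow process, which is exactly the unreliability the abstract refers to: a silent peer is indistinguishable from a crashed one until its next heartbeat arrives. A caller would feed `on_heartbeat` from its message handler and call `check` periodically with its local clock value.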
Similar resources
About the Relationship between Election Problem and Failure Detector in Asynchronous Distributed Systems
This paper is about the relationship between Election problem and Failure Detector in asynchronous distributed systems. We first discuss the relationship between the Election problem and the Consensus problem in asynchronous distributed systems with unreliable failure detectors. Chandra and Toueg have stated that Consensus is solvable in asynchronous systems with unreliable failure detectors. B...
On the Respective Power of ◇P and ◇S to Solve One-Shot Agreement Problems
Unreliable failure detectors are abstract devices that, when added to asynchronous distributed systems, make it possible to solve distributed computing problems (e.g. Consensus) that otherwise would be impossible to solve in these systems. This paper focuses on two classes of failure detectors defined by Chandra and Toueg, namely, the classes denoted ◇P (eventually perfect) and ◇S (eventually strong). Bot...
Fast Asynchronous Uniform Consensus in Real-Time Distributed Systems
We investigate whether asynchronous computational models and asynchronous algorithms can be considered for designing real-time distributed fault-tolerant systems. A priori, the lack of bounded finite delays is antagonistic with timeliness requirements. We show how to circumvent this apparent contradiction, via the principle of “late binding” of a solution to some (partially) synchronous model. ...
Unreliable Failure Detectors via Operational Semantics
The concept of unreliable failure detectors for reliable distributed systems was introduced by Chandra and Toueg as a fine-grained means to add weak forms of synchrony into asynchronous systems. Various kinds of such failure detectors have been identified as each being the weakest to solve some specific distributed programming problem. In this paper, we provide a fresh look at failure detectors...
Implementing unreliable failure detectors with unknown membership
Unreliable failure detectors [3] are useful devices to solve several fundamental problems in fault-tolerant distributed computing, like consensus or atomic broadcast. In their original work [3], Chandra and Toueg proposed 8 different classes of unreliable failure detectors, and showed that all of them can be used to solve consensus in a crash-prone asynchronous system with reliable links. All t...
Stubborn Communication Channels
This paper aims at bridging the gap between the assumption of reliable channels by fault-tolerant distributed algorithms and the weak reliability of feasible communication channels. We define a new kind of communication channels which we call Stubborn channels. Stubborn channels are easily implementable over a connectionless network layer and, although weak, the reliability guarantees offered by ...
Publication date: 2003